## Compiler Design

#### Samit Biswas

samit@cs.iiests.ac.in



Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur

November 12, 2017



### **Code Generation**

#### **Code Generation:**

- The final phase of a compiler is code generator.
- It receives an intermediate representation (IR) with supplementary information in symbol table.
- Produces semantically equivalent target program.
- Code generators Main Tasks:
  - Instruction selection.
  - Register allocation and selection.
  - Instruction Ordering.

#### **Code Generation Issues**

- The most important criterion is that it produces correct target code.
- Input to code generator
  - ► IR + Symbol Table.
  - We assume that front-end produces low level IR; i.e. values in it can be directly manipulated by the machine instructions.
- The target Program
  - Prerequisite: target machine and its instruction set.

#### Instruction Selection

Every three address statement of the form:

$$X = Y + Z$$

translated into the following code sequences:

```
MOV y, R0 /* load y into register R0 */
ADD z, R0 /* add z to R0 */
MOV R0, x /* store R0 to x */
```

## The desired quality of target code:

$$a = b + c$$
  
 $d = a + e$ 

#### would be translated into:

MOV b, R0 ADD c, R0 MOV R0, a MOV a, R0 ADD e, R0 MOV R0, d

## A Simple Target Machine Model

- Byte addressable machine with four bytes to a word.
- ▶ n general purpose registers, R0, R1, ..., Rn 1
- It has the address instruction of the form

op source destination

op is an op-code; source and destination are data field.

- It has the following op-codes (among others)
  - MOV (move source to destination)
  - ADD (add source to destination)
  - SUB (subtract source to destination)

Table: Address modes

| Mode              | Form  | Address                   |
|-------------------|-------|---------------------------|
| absolute          | М     | M                         |
| register          | R     | R                         |
| indexed           | c(R)  | c + contents(R)           |
| indirect register | *R    | contents(R)               |
| indirect indexed  | *c(R) | contents(c + contents(R)) |

*contents*(*a*) denotes the contents of register or memory address represented by a.

# **Assignment Statement:** d = (a - b) + (a - c) + (a - c) might be translated into the following TAC:

t = a - b u = a - c v = t + ud = v + u

with d live at end.

#### Generated Code:

| Statements | Code       | Register       | Address            |
|------------|------------|----------------|--------------------|
|            | Generated  | Descriptor     | Descriptor         |
|            |            | Register empty |                    |
| t = a - b  | MOV a, R0  | R0 Contains t  | t in R0            |
|            | SUB b, R0  |                |                    |
| u = a - c  | MOV a, R1  | R0 contains t  | t in R0            |
|            | SUB c, R1  | R1 contains u  | u in R1            |
| v = t + u  | ADD R1, R0 | R0 contains v  | t in R0            |
|            |            | R1 contains u  | v in R0            |
| d = v + u  | ADD R1, R0 | R0 contains d  | d in R0            |
|            | MOV R0, d  |                | d in R0 and Memory |

## **Code Sequences for indexed assignments**

Indexing operations in three address statements are handled in the same manner as binary operations. Consider the following indexed assignments:

$$a = b[i]$$
  
 $a[i] = b$ 

Code Sequences for indexed assignments:

| Statement | i in Register R <sub>i</sub> | i in Memory M <sub>i</sub> |
|-----------|------------------------------|----------------------------|
|           | Code                         | Code                       |
| a = b[i]  | MOV b(Ri), R                 | MOV Mi, R                  |
|           |                              | MOV b(R), R                |
| a[i] = b  | MOV b, a(Ri)                 | MOV Mi, R                  |
|           |                              | MOV b, a(R)                |

## **Code Sequences for pointer assignments**

Consider the following pointer assignments:

$$a = *p$$
  
 $*p = a$ 

| Statement | p in Register Rp       | p in Memory M <sub>p</sub> |
|-----------|------------------------|----------------------------|
|           | Code                   | Code                       |
| a = *p    | $MOV *R_p$ , a         | MOV Mp, R                  |
|           | ·                      | MOV *R, R                  |
| *p = a    | MOV a, ∗R <sub>p</sub> | MOV Mp, R                  |
|           | ,                      | MOV a, *R                  |

## **Code Sequences for Conditional Statements**

- For branch instruction set a condition code to to indicate whether the last quantity computed or loaded into a register is negative, zero or positive.
- Compare instruction (CMP in our case) has the desirable property that it set the condition code without actually computing a value.
- ▶ That is *CMP* x, y sets a condition code to positive *if* x > y, and so on. a conditional-jump machine instruction makes the jump if a designated condition  $<, =, >, \leq, \neq, \geq$  is met.

if 
$$x < y$$
 goto z

#### This can be translated to

```
CMP x, y CJ< z /*jump if the condition code is negative or zero*/
```



## **Example Conditional Statement**

$$x = y + z$$
  
if  $x < 0$  goto z

#### After translation:

The condition code was determined by x after **ADD z**, **R0**.

## **Basic Block and Flow Graphs**

- A graph representation of TAC called a flow graph.
- Nodes in flow graph represent computations.
- The edges represents the flow of controls.
- Some register assignment algorithms use flow graphs to find the inner loops where a program is expected to spend most of its time.

## **Basic Block and Flow Graphs**

- Basic Block: A basic block is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end without halt or possibility of branching except at the end.
- A name in a basic block is said to be live at a given point if its value is used after that point in the program, perhaps in another basic block.

## Algorithm: Partition into Basic Block

Input: A sequence of Three Address Statement.

Output: A list of Basic Blocks with each TAC in exactly one block.

#### Method:

- Determine the leaders, the first statements of Basic blocks.
  - i) The first statement is a leader.
  - ii) Any statement that is the target of a conditional or unconditional goto is a leader.
  - iii) Any statement that immediately follows a goto or conditional goto statement is a leader.
- 2.For each leader, its basic block consists of the leader and all statements up to but not including the next leader or the end of the program.

Consider the following code fragment - It computes the dot product of two vectors *a* and *b* of length 20.

```
begin

prod := 0;
i := 1;
do begin

prod := prod + a[i] * b[i];
i := i + 1
end
while i <= 20
end</pre>
```

## TAC to computing the dot product:

```
(1) prod := 0

(2) i := 1

(3) t<sub>1</sub> := 4 * i

(4) t<sub>2</sub> := a [ t<sub>1</sub> ] /* compute a[i] */

(5) t<sub>3</sub> := 4 * i

(6) t<sub>4</sub> := b [ t<sub>3</sub> ] /* compute b[i] */

(7) t<sub>5</sub> := t<sub>2</sub> * t<sub>4</sub>

(8) t<sub>6</sub> := prod + t<sub>5</sub>

(9) prod := t<sub>6</sub>

(10) t<sub>7</sub> := i + 1

(11) i := t<sub>7</sub>

(12) if i <= 20 goto (3)
```

## Flow graph for the program:



## **Register and address Descriptors**

- A register descriptor keeps track of what is currently in each register. It is consulted whenever a new register is needed.
- An address descriptor keeps track of the location where the current value of the name can be found at runtime.

## A Code Generation Algorithm

The Code generation Algorithm takes as input a sequence of three-address statements constituting a basic block. For each three address statement (x = y op z) perform the following:

- 1. Invoke a function *getreg* to determine the location L where the result of the computation y op z should be stored.
- Consult address descriptor. If the value of y is not already in L generate instruction MOV y, L to place a copy of y in L.
- Generate the instruction OP z, L. If L is a register, update descriptor to indicate that it contains the value of x, and remove x from all other register descriptors.
- If the current values of y and / or z have no next uses, they are not live on exit from the block. Alter the register descriptor.

## The function getreg

*getreg* returns the location L to hold the value of x for the assignment (x = y op z).

- If the name y is in a register that holds the value of no other names and If the y is not Live and has no next use after execution of (x = y op z) then return the register of y for L. Update the address descriptor of y that y is no longer in L.
- 2. Failing (1), return an empty register for L if there is one.
- 3. Failing (2), find an occupied register R, to be get freed.
- 4. If x is not used in the block, or no suitable occupied register can be found select memory location of x as L.